MIS 1038-766 1997 11 14 Dl
TITLE OF INVENTION
ALPHAVIRUS VECTORS

FIELD OF INVENTION

The present invention relates to the field of DNA
vaccines and is particularly concerned with modified
alphavirus vectors for use in such vaccines.

BACKGROUND OF THE INVENTION

Semliki Forest virus (SFV) is a member of the
Alphavirus genus in the Togaviridae family. The mature
virus particle contains a single copy of a ssRNA genome
with a positive polarity that is 5'-capped and
3'-polyadenylated. It functions as an mRNA and naked RNA
can start an infection when introduced into cells. Upon
infection/transfection, the 5' two-thirds of the genome
is translated into a polyprotein that is processed into
the four nonstructural proteins (nsP1 to 4) by self
cleavage. Once the ns proteins have been synthesized
they are responsible for replicating the plus-strand
(42S) genome into full-length minus strands (ref. 14).
These minus-strands then serve as templates for the
synthesis of new plus-strand (42S) genomes and the 26S
subgenomic mRNA (ref. 1 - Throughout this application,
various references are cited in parentheses to describe
more fully the state of the art to which this invention
pertains. Full bibliographic information for each
citation is found at the end of the specification. The
disclosures of these references are hereby incorporated
by reference into the present disclosure). This
subgenomic mRNA, which is colinear with the last
one-third of the genome, encodes the SFV structural
proteins. In 1991 Liljeström and Garoff (ref. 2)
designed a series of expression vectors based on the SFV
cDNA replicon. These vectors had the virus structural
protein genes deleted to make way for heterologous
inserts, but preserved the nonstructural coding region
for production of the nsP1 to 4 replicase complex.
Short 5' and 3' sequence elements required for RNA
replication were also preserved. A polylinker site was
inserted downstream from the 26S promoter followed by
translation stop sites in all three frames. An SpeI
site was inserted just after the 3' end of the SFV cDNA
for linearization of the plasmid for use in in vitro
transcription reactions.
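
The requirement that the polylinker be followed by translation stop sites in all three frames can be checked computationally. The following is a minimal sketch, assuming a DNA string read 5' to 3'. The function name and the example cassette are hypothetical and are not taken from the SFV vectors.

```python
# Illustrative sketch only: names and sequences are assumptions, not SFV data.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq):
    """Return the set of forward reading frames (0, 1, 2) containing a stop codon."""
    seq = seq.upper()
    frames = set()
    for frame in range(3):
        # Walk the sequence codon by codon in this frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames


# Hypothetical stop cassette with one TAA codon in each frame.
cassette = "TAATTAATTAA"
print(sorted(frames_with_stop(cassette)))
```

A cassette passes the check when the returned set contains all three frames, so that translation of an insert terminates regardless of the frame in which it was cloned.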

Injection of SFV RNA encoding a heterologous
protein has been shown to result in the expression of
the foreign protein and the induction of antibody in a
number of studies (refs. 3, 4). The use of SFV RNA
inoculation to express foreign proteins for the purpose
of immunization would have several of the advantages
associated with plasmid DNA immunization. For example,
SFV RNA encoding a viral antigen may be introduced in
the presence of antibody to that virus without a loss in
potency due to neutralization by antibodies to the
virus. Also, because the protein is expressed in vivo,
the protein should have the same conformation as the
protein expressed by the virus itself. Therefore,
concerns about conformational changes which could occur
during protein purification leading to a loss in
immunogenicity, protective epitopes and possibly
immunopotentiation, could be avoided by plasmid DNA
immunization.

In WO 95/27044, the disclosure of which is
incorporated herein by reference, there is described the
use of alphavirus cDNA vectors based on cDNA
complementary to the alphavirus RNA sequence. Once
transcribed from the cDNA under transcriptional control
of a heterologous promoter, the alphavirus RNA is able
to self-replicate by means of its own replicase and
thereby amplify the copy number of the transcribed
recombinant RNA molecules.

SUMMARY OF THE INVENTION

The present invention is concerned with
modifications to the alphavirus cDNA vectors described
in the aforementioned WO 95/27044 to permit enhanced
replication of the alphavirus. In the present
invention, a heterologous splice site is introduced
into the alphavirus replicon sequence, particularly
that of Semliki Forest virus (SFV).

Accordingly, in one aspect, the present invention
provides a cDNA molecule complementary to at least part
of an alphavirus RNA genome, which cDNA molecule
comprises the complement of the complete alphavirus RNA
genome regions which are essential for replication of
the alphavirus RNA, and further comprises a
heterologous cDNA sequence capable of expression in an
animal or human host cell, said heterologous cDNA
sequence being inserted into a region of the cDNA
molecule which is non-essential to replication thereof,
and the cDNA molecule being placed under
transcriptional control of a promoter sequence
functional in said animal or human cell, wherein at
least one heterologous splice site is provided in the
complement of the complete alphavirus RNA genome
regions which are essential for replication of the
alphavirus RNA, to prevent aberrant RNA splicing of the
alphavirus.

The alphavirus molecule is a large molecule and,
accordingly, there is a high probability that it
contains splice sites at which aberrant splicing can
occur, thereby impairing the replication of the
alphavirus and hence its ability to express the
heterologous DNA. By introducing the at
least one heterologous splice site in accordance with
the present invention into the alphavirus replicon
sequence, any splicing is likely to be directed at the
heterologous splice site rather than any endogenous
splice site.
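
The size argument above can be made concrete with a rough count. The sketch below is an illustration only: it counts the minimal GT (donor) and AG (acceptor) dinucleotides that border most introns and computes the number expected in a random sequence of a given length with equal base frequencies. Real splice-site recognition depends on a longer consensus and on sequence context, so such counts are at best a loose upper bound on candidate sites, not a prediction. The function names and the toy sequence are hypothetical.

```python
# Illustrative sketch only: a crude dinucleotide count, not a splice-site predictor.

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def expected_random_hits(length, motif_len):
    """Expected motif count in a random sequence with equal base frequencies."""
    return max(length - motif_len + 1, 0) / 4 ** motif_len


toy = "AAGGTAAGTCCTTTCAGGA"  # hypothetical sequence, not SFV
print("GT:", count_motif(toy, "GT"), "AG:", count_motif(toy, "AG"))

# 7983 bp is the EcoRV-SpeI fragment length stated in this specification.
print("expected random GT count:", expected_random_hits(7983, 2))
```

The expected count grows linearly with sequence length, which is the sense in which a large replicon is more likely than a short insert to contain sequences resembling splice signals.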

In the constructs provided herein, the promoter
may be directly coupled to the 5'-end of the alphavirus
sequence, which has the effect of reducing the
variability in the splicing event at the 5'-end of the
alphavirus replicon.

In addition, there may be provided at the 3'-end of
the Semliki Forest virus segment, a hepatitis delta
ribozyme sequence to ensure proper in vivo cleavage at
the 3'-end of the sequence. Any other convenient
sequence may be employed to achieve this effect.

The heterologous splice site sequence may be
provided by the nucleotide sequence of the rabbit
β-globin intron II, as described in reference 5. Such
heterologous splice site sequence may be inserted into
the complement sequence at any convenient location
which does not preclude replication of the alphavirus.

I have identified five suitable sites in the SFV
replicon, which are contained within an EcoRV-SpeI
fragment of the replicon which is 7983 bp in length
(Fig. 3). The first such site is a Ppu-MI site, at
position 2719 within the EcoRV-SpeI fragment.

In constructing the modified vectors provided
herein, the EcoRV-SpeI fragment is cut with Ppu-MI at
position 2719 and made blunt-ended with Mung Bean
nuclease, which removes three bases from the SFV
sequence. A blunt-ended β-globin II intron, which is
536 bp long, is ligated into the site and replaces the
missing three bases with sequence added to the 3'-end
of the β-globin intron sequence (Fig. 1).
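
The bookkeeping of this step can be sketched in a few lines. The example below is a toy model under stated assumptions: the sequences are hypothetical placeholders rather than the SFV or β-globin sequences, the three trimmed bases are modeled as a contiguous run starting at an index chosen for illustration, and the helper names are invented here. It shows why the intron carries three extra bases at its 3'-end: once the intron proper is removed, the original sequence is restored.

```python
# Illustrative sketch only: toy sequences and hypothetical helper names.

def insert_intron(fragment, trim_start, n_trimmed, intron, fill):
    """Replace n_trimmed bases at trim_start with intron + fill.

    fill holds the bases lost to nuclease trimming; it is appended to the
    3'-end of the intron so that it remains in the construct after splicing.
    """
    return fragment[:trim_start] + intron + fill + fragment[trim_start + n_trimmed:]


def splice_out(construct, intron_start, intron_len):
    """Remove the intron proper, as splicing would at the RNA level."""
    return construct[:intron_start] + construct[intron_start + intron_len:]


fragment = "AAACCCGGTTTTGGG"       # hypothetical stand-in for the replicon fragment
trim_start, n_trimmed = 6, 3       # the three bases "GGT" are assumed lost
lost = fragment[trim_start:trim_start + n_trimmed]
intron = "GTAAGTCCCCCCAG"          # hypothetical intron: begins GT, ends AG

construct = insert_intron(fragment, trim_start, n_trimmed, intron, lost)
print(construct)
print(splice_out(construct, trim_start, len(intron)) == fragment)
```

In this model the construct is longer than the starting fragment by exactly the length of the intron proper, because the three fill bases compensate for the three trimmed bases.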

The other four suitable sites for insertion of the
intron are the PvuII sites at bp 2518, 3113, 6498 and
6872 of the EcoRV-SpeI fragment. Insertion of the
intron is achieved by cutting with PvuII (a blunt end
cutter) and the blunt-ended β-globin II intron sequence
(Fig. 2) is ligated into one or more of these sites.
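
Locating candidate PvuII sites and inserting a blunt-ended intron can be sketched as follows. PvuII recognizes CAGCTG and cuts between CAG and CTG, leaving blunt ends. The sequences and function names below are hypothetical illustrations, and the reported positions are 0-based string offsets, not the bp coordinates of the EcoRV-SpeI fragment.

```python
# Illustrative sketch only: toy sequences and hypothetical helper names.
PVUII_SITE = "CAGCTG"   # PvuII cuts CAG^CTG, leaving blunt ends
CUT_OFFSET = 3          # cut falls after the third base of the site


def find_blunt_cuts(seq, site=PVUII_SITE, cut_offset=CUT_OFFSET):
    """Return the 0-based offsets at which the enzyme would cut seq."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    return cuts


def insert_at(seq, cut, insert):
    """Ligate a blunt-ended insert into seq at the given cut offset."""
    return seq[:cut] + insert + seq[cut:]


toy = "TTCAGCTGAAACAGCTGCC"     # hypothetical sequence with two PvuII sites
intron = "GTAAGTCCCCCCAG"       # hypothetical intron stand-in
cuts = find_blunt_cuts(toy)
modified = insert_at(toy, cuts[0], intron)
print(cuts)
print(modified)
```

When an insert is placed into more than one site of the same sequence, working from the last cut offset back to the first keeps the earlier offsets valid.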

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 shows the DNA sequence of the β-globin II
intron encoding three additional nucleotides at the
3'-end thereof (SEQ ID No: 1);

Figure 2 shows the DNA sequence of the β-globin II
intron (SEQ ID No: 2); and

Figures 3A to 3E show the DNA sequence of the
EcoRV-SpeI fragment of Semliki Forest virus replicon
(SEQ ID No: 3).



GENERAL DESCRIPTION OF INVENTION

As discussed above, the present invention provides
a modified alphavirus cDNA. The alphavirus preferably
is Semliki Forest virus.

The promoter sequence may comprise a promoter of
eukaryotic or prokaryotic origin. A suitable promoter
is the cytomegalovirus immediate early promoter
(pCMV), although other promoters, such as the Rous
sarcoma virus long-terminal repeat promoter (pRSV),
may be used, since, in the case of these and similar
promoters, transcription is performed by the
DNA-dependent RNA polymerase of the host cell.
Additionally, the SP6, T3 or T7 promoters can be used,
provided that the cell has first been transformed with
genes encoding SP6, T3 or T7 RNA polymerase molecules
which are either inserted into the chromosome or remain
episomal. Expression of these (SP6, T3, T7) RNA
polymerase-encoding genes is dependent on the host
cell DNA-dependent RNA polymerase.

The heterologous cDNA insert may comprise the
coding sequence for a desired product, which may be a
biologically active protein or polypeptide, e.g., an
immunogenic or antigenic protein or polypeptide, or a
therapeutically active protein or polypeptide. The
heterologous cDNA may also comprise additional
sequences, such as a sequence complementary to an RNA
sequence which is a self-cleaving ribozyme sequence.

The DNA vectors provided herein may be
administered to a host, including a human host, for in
vivo expression of the heterologous cDNA sequence, in
accordance with a further aspect of the invention, in
order to generate an immune response in the host, which
may be a protective immune response. The DNA vectors
may be further formulated into immunogenic compositions
for such administration.

SUMMARY OF DISCLOSURE

In summary of this disclosure, the present
invention provides a modified alphavirus-based
expression vector wherein at least one splice site is
introduced to the alphavirus replicon to prevent
aberrant splicing of the alphavirus genome.
Modifications are possible within the scope of the
invention.

ABSTRACT OF THE DISCLOSURE

A modified alphavirus expression vector is
provided wherein at least one heterologous splice site
is introduced to the alphavirus replicon to prevent
aberrant splicing of the alphavirus, which may be
Semliki Forest virus, following administration of the
vector to a host.

[Figure 3 (SEQ ID No: 3), DNA sequence of the EcoRV-SpeI fragment of the Semliki Forest virus replicon, pages carrying position markers 10 to 5340: sequence columns not recoverable from the scanned source.]

GGCAGCGGAC ATTTACAACA Ai^^JCGTT AGGCAGCACA ATCTCCAGTG CGCA^^TG

5350 5360 5370 5380 5390 5400

GATQCGGTCfC AGGAGGAGAA AATGTACCCG CCAAAATTGG ATACTGAGAG GGAGAAGCTG 

5410 5420 5430 5440 5450 5460 

TTGCTGCTGA AAATGCAGAT GCACCCATCG GAGGCTAATA AGAGTCGATA CCAGTCTCGC 

5470 5480 5490 5500 5510 5520

AAAGTGGAGA ACATGAAAGC CACGGTGGTG OACAGGCTCA CATCGGGGGC CAGATTGTAC 

5530 5540 5550 5560 5570 5580 

ACGGGAGCGG ACGTAGGCCG CATACCAACA TACGCGGTTC GGTACCCCCG CCCCGTGTAC 

5590 5600 5610 5620 5630 5640 

TCCCCTACCG TGATCGAAAG ATTCTCAAGC CCCGATGTAG CAATCGCAGC GTCCAACGAA 

5650 5660 5670 5680 5690 5700 

TACCTATCCA GAAATTACCC AACAGTGOCG TCGTACCAGA TAACAGATGA ATACGACGCA 

5710 5720 5730 5740 5750 5760 

TACTTGGACA TGGTTGACGG GTCGGATAGT TGCTTGGACA GAGCGACATT CTGCCOGGOG 

5770 5780 5790 5800 5810 5820

AAGCTCCX3GT GCTACCCGAA ACATCATGCG TACCACCAGC CGACTGTACG CAGTGCCGTC 

5830 5840 5850 5860 5870 5880 

CC6TCACCCT TTCAOAACAC ACTACAGAAC GTGCTAGCGG COGCCACCAA GAGAAACl^C 

5890 5900 5910 5920 5930 5940 

AACGTCACGC AAATGCGAGA ACTACCCACC ATGGACTCGG CAGTGTTCAA CGTGGAGTGC 

5950 5960 5970 5980 5990 6000 

TTCAAGCGCT ATOCCTGCTC CGGAGAATAT TGGGAAGAAT ATGCTAAACA ACCTATCOGG 

6010 6020 6030 6040 6050 6060 

ATAACCACTG AGAACATCAC TACCTATGTG ACCAAATTGA AAGGCCCGAA AGCTGCTGCC 
6070 6080 6090 6100 6110 6120

•r^TTCGCTA AGACCCACAA CTTGGTTCXX; CTGCAGGAQG TTCCCATGGA CAGATTCAOG 
6130 6140 6150 6160 6170 6180

G^eGACATGA AACGAGATGT CAAAGTCACT CCAGGGACGA AACACACAGA GGAAAGACCC 
6190 6200 6210 6220 6230 6240

AAAGTCCAGG TAATTCAAGC AGC6GAGCCA TTGGCGACCG CTXACCTGTG CGGCATCXaiC 
6250 6260 6270 6280 6290 6300

A^GAATTAG TAAGGAQACT AAAIXSCIXSTG TTACGCCCTA ACGTGCACAC ATTGTTTGAT 
6310 6320 6330 6340 6350 6360

AgraTCGGCOG AAGACTTTGA CGCGATCATC GCCTCTCACT TCCACCCAGG AGACCCGGTT 

6370 6380 6390 6400 6410 6420 

da^AGACGG ACATTGCATC ATTC6ACAAA AGCCAGGACG ACTCCTTGGC TCTTACAGCrr 

6430 6440 6450 6460 6470 6480 

Tt^ATGATCC TCGAAGATCT AGGGGTGGAT CAGTACCTGC TGGACTTGAT CGAGGCAGCC 

6490 6500 6510 6520 6530 6540 

T^^GGGGAAA TATCCAGCTG TCACCTACCA ACTGGCACGC GCTTCAAGTT OGGAGCTATG 

6550 6560 6570 6580 6590 6600 

ATGAAATCG6 GCATCTTTCT GACTTTCTTT ATTAACACTQ TTTTGAACAT CACCATAGCA 

6610 6620 6630 6640 6650 6660 

AGCAGGGTAC TGGAGCAGAG ACTCACTGAC TCCGCCTGTG OGGCCTTCAT CGGCGACGAC 

6670 6680 6690 6700 6710 6720 

AACATOGTTC ACGGAGTGAT CTCCGACAAG CTGATGGCGG AGAGGTGCGC GTCGTGGGTC 

6730 6740 6750 6760 6770 6780 

AACATGGAGG TGAAGATCAT TGACGCTGTC ATGGCCGAAA AACCCCCATA TTTTTGTGGG 

6790 6800 6810 6820 6830 6840 

GGATTCATAG TTTTTGACAG CGTCACACAG ACCGCCTGCC GTGTTTCAGA CCCACTTAAG 

6850 6860 6870 6880 6890 6900 

CGCCTGTTCA AGTTGGGTAA GCCOCTAACA GCTGAAGACA AGCAGGACGA AGACAGGCGA 

6910 6920 6930 6940 6950 6960 

CGAGCACTGA GTGACQAGGT TAGCAAGTGG TTCCGGACAG GCTTGGGGGC CGAACTGGAG 

6970 6980 6990 7000 7010 7020 

GTGGCACTAA CATCTAGGTA TGAGGTAGAG GGCTGCAAAA GTATCCTCAT AGCCATGGCC 

7030 7040 7050 7060 7070 7080 

ACCTTGGCOA GGGACATTAA GGCGTTTAAG AAATTGAGAG QACCTGTTAT ACACCTCTAC 

7090 7100 7110 7120 7130 7140

GGCGGTCCTA GATTGGTCCG TIAATAcBK jGAATTCTGAT tggatcccx;g GTAATTAA<^

7150 7160 7170 7180 7190 7200

GAATTACATC CCTACGCAAA CGTTTTACGG CCGCCQGTGG cgcccgcgcc CGGCGGCCCG

7210 7220 7230 7240 7250 7260

TCCTTOGCCG TTGCAGGCCA CTCCGGTGGC TCCCGTCGTC CCCGACTTCC AGGCCCAGCA

7270 7280 7290 7300 7310 7320

G2VT6CAGCAA CTCATCAGCG CCGTAAATGC GCTGACAATG AGACAGAAOG CAATTQCTCC

7330 7340 7350 7360 7370 7380

TGCTAGGCCr CCCAAACCAA AGAAGAAGAA GACAACCAAA CCAAAOCCGA AAACGCAGCC

7390 7400 7410 7420 7430 7440

CAAGAAGATC AA06GAAAAA OGCAGCAGCA AAAGAAG&;U\ GACAA6CAAG CCGACAAGAA

7450 7460 7470 7480 7490 7500

GAAGAAGAAA CCCGGAAAAA GAGAAAGAAT GTGCATGAAG ATTGAAAATG ACTGIATCTT

7510 7520 7530 7540 7550 7560

CGTATGCOGC TAGCCACAGT AACGTAGTGT TTCCAGACAT GTCGGGCACC GCACTATCAT

7570 7580 7590 7600 7610 7620

GGGTGCAGAA AATCTCG<5GT [illegible] CCTTCGCAAT CGGCGCTATC CTGGTGCTGG

7630 7640 7650 7660 7670 7680

TTGTGGTCAC TTGCATTGGG CTCCGCAGAT AAGTTAGGGT AGGCAATGGC ATTCATATAG

7690 7700 7710 7720 7730 7740

CAAGAAAATT GAAAACAGAA AAAGTTAGGG TAAGCAATGG CATATAACCA TAACTGTATA

7750 7760 7770 7780 7790 7800

ACTTGTAACA AAGCGCJACa AGACCTGCGC AATTGGCCCC GTGGTCCGCC TCACGGAAAC

7810 7820 7830 7840 7850 7860

TQ3GGGCAAC TCATATTGAC ACATTAATTG GCAATAATTG GAAOCTTACA TAAGCTTAAT

7870 7880 7890 7900 7910 7920

T^ACGAATA ATTC3JGAr'rrT TATTTTATTT TGCAATTGGT TTTTAATATT TCCAAAAAAA

7930 7940 7950 7960 7970 7980

Aj^AAAAAAA AAAAT^AAAAA AAA7UVAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

7990 8000 8010 8020 8030 8040